Introduction: Programming hidden Markov models

Author

  • Chris Bystroff
Abstract

Since their formulation by Andrei Markov in 1906 [7], Markov chains (MC) and hidden Markov models (HMM) have found a place in diverse fields of science and engineering, from speech recognition to weather prediction to protein sequence alignment. Wherever a data set can be expressed as a string of discrete symbols, and the data share a common source or a common underlying principle, a hidden Markov model may be designed to extract that principle.

A Markov chain is a directed graph whose nodes are Markov states and whose edges are directed state-to-state transitions. A Markov state is said to ‘emit’ a symbol, which is unique to that state. A “hidden” Markov model (also sometimes called a “state space” or “latent Markov” model) differs from a MC in that each state emits one of a set of symbols drawn from a distribution; different hidden states may emit the same symbol, and non-emitting states are possible. The term ‘hidden’ is used because the symbol sequence alone does not tell us the state sequence directly; the latter must be inferred. In a HMM the states have a meaning all their own, separate from the meaning of the symbols they emit. For example, in a HMM composed of states that emit temperature readings, the states themselves may represent precipitation readings, or wind direction, or seasons, or all of the above. HMM states are classifiers of the symbols in the data string(s), their types and their contexts.

Algorithms for computing the probabilistic fit between a data string, or a set of strings, and a HMM have long since been worked out. The groundbreaking work of L.E. Baum in the 1960s led to the expectation-maximization (EM) method for locally optimizing HMM parameters. In 1967, Andrew Viterbi published a general algorithm for finding the optimal state pathway given a sequence [8].
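The Viterbi algorithm mentioned above can be sketched in a few lines. The two-state weather model below ("Rainy"/"Sunny" states emitting activity symbols) and all of its probabilities are hypothetical, chosen only to illustrate the dynamic-programming recursion, not taken from the article.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for an observation sequence."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path probability
            prev, p = max(((r, V[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda x: x[1])
            V[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

print(viterbi(("walk", "shop", "clean"), states, start_p, trans_p, emit_p))
# → ['Sunny', 'Rainy', 'Rainy']
```

Note that the same symbol ("shop") can be emitted by either state; only the joint probabilities over the whole path decide which hidden state is inferred at each position.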
Lawrence Rabiner’s highly cited 1989 tutorial [6] outlined the “Three Basic Problems” for HMMs (see box) and brought these techniques within reach of scientists not traditionally trained in probability theory, including even biologists (who then became known as “bioinformaticists”). Several good books on the subject are now available [1–5].

But the algorithmology of HMMs still has many unsolved problems, some of which are addressed in the current special issue. For example, the space of all possible directed graphs of size Q may be very large, far too large to be searched exhaustively. How do we find the optimal graph connectivity in a data-driven manner? How do we simultaneously define the number of states and initialize their emission probabilities, also in a data-driven manner? One theoretical approach is presented in this issue (see the article by Li and Biswas). A related problem is to determine the Markov chain order given only the data: in a first-order chain, the transition probability depends only on the current state; in a second-order model it depends on the two most recent states, and so on. An answer is given in this issue (see the article by Boys and Henderson). Also, how do we impose constraints from domain-specific knowledge on the HMM topology? In this issue we present the special cases of psychological data and speech recognition (see the articles by Visser et al. and by Abdulla, respectively).

The Three Basic Problems for HMMs [6]
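The first of Rabiner's basic problems, scoring an observation sequence against a model, is solved by the forward algorithm, which sums over all hidden-state paths rather than maximizing over them. The sketch below uses a hypothetical two-state model with made-up probabilities, for illustration only.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Compute P(observations | model) by summing over all state paths."""
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for sym in obs[1:]:
        alpha = {s: emit_p[s][sym] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

states = ("A", "B")
start_p = {"A": 0.5, "B": 0.5}
trans_p = {"A": {"A": 0.9, "B": 0.1},
           "B": {"A": 0.2, "B": 0.8}}
emit_p = {"A": {"x": 0.8, "y": 0.2},
          "B": {"x": 0.3, "y": 0.7}}

p = forward(("x", "y", "x"), states, start_p, trans_p, emit_p)
print(round(p, 4))
# → 0.1033
```

The same recursion, run backward, gives the beta variables used by Baum's EM procedure (Baum-Welch) to re-estimate the transition and emission probabilities, which is Rabiner's third basic problem.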


Similar articles

Introducing Busy Customer Portfolio Using Hidden Markov Model

Despite the effective role of Markov models in customer relationship management (CRM), a comprehensive literature review covering all related work has been lacking. This paper searches academic databases for all articles published in 2011 and earlier. One hundred articles were identified and reviewed for direct relevance to applying Markov models...


depmixS4: An R Package for Hidden Markov Models

This introduction to the R package depmixS4 is a (slightly) modified version of Visser and Speekenbrink (2010), published in the Journal of Statistical Software. Please refer to that article when using depmixS4. The current version is 1.3-3; the version history and changes can be found in the NEWS file of the package. Below, the major versions are listed along with the most noteworthy changes. ...


Hidden Markov Models with Finite State Supervision, Eric

In this chapter we provide a supervised training paradigm for hidden Markov models (HMMs). Unlike popular ad-hoc approaches, our paradigm is completely general, need not make any simplifying assumptions about independence, and can take better advantage of the information contained in the training corpus.


Intrusion Detection Using Evolutionary Hidden Markov Model

Intrusion detection systems are responsible for diagnosing and detecting any unauthorized use, exploitation, or destruction of a system, and can prevent cyber-attacks through network packet analysis. One of the major challenges in using these tools is the lack of training patterns of attacks available to the analysis engine, ...




Journal:

Volume   Issue 

Pages  -

Publication date: 2002